Terminology Evolution in Web Archiving: Open Issues

Authors

  • Nina Tahmasebi
  • Tereza Iofciu
  • Thomas Risse
  • Claudia Niederée
  • Wolf Siberski
Abstract

The correspondence between the terminology used for querying and the terminology used in the content objects to be retrieved is a crucial prerequisite for effective retrieval technology. However, as terminology evolves over time, a growing gap opens up between older documents in (long-term) archives and the active language used for querying such archives. Thus, technologies for detecting and systematically handling terminology evolution are required to ensure the “semantic” accessibility of (Web) archive content in the long run. As a starting point for dealing with terminology evolution, this paper formalizes the problem and discusses issues, first ideas, and relevant technologies.

Similar resources

Terminology Evolution Module for Web Archives in the LiWA Context

More and more national libraries and institutes are archiving the web as a part of the cultural heritage. As with all long term archives, these archives contain text and language that evolves over time. This is particularly true for web archives as content published online is highly dynamic and changing at a fast rate. The language evolution causes gaps between the terminology used for querying...

Data Integration for Open Data on the Web

In this lecture we will discuss and introduce challenges of integrating openly available Web data and how to solve them. Firstly, while we will address this topic from the viewpoint of Semantic Web research, not all data is readily available as RDF or Linked Data, so we will give an introduction to different data formats prevalent on the Web, namely, standard formats for publishing and exchangi...

eScience and archiving for space science

Our scientific meetings, IT publications, and even the media are awash with new terminology and possibilities about how “a new age has dawned in scientific and engineering research” made possible through distributed science collaborations enabled by the internet, viz., eScience, the grid, cyberinfrastructure, and virtual observatories. All of these new structures and tools depend critically on ...

Political Terms by APLL: Issues of Terminology Implantation and Acceptability

The present study investigates the implantation of political science terminology approved by the Academy of Persian Language and Literature (APLL) in the Hamshahri corpus made up of news text from Hamshahri newspaper and their acceptability among MA students of English translation studies (ETS), English literature (EL), and Political science (PS). To conduct this research the frequencies of the...

The Web-at-Risk at Three: Overview of an NDIIPP Web Archiving Initiative

The Web-at-Risk project is a multi-year National Digital Information Infrastructure and Preservation Program (NDIIPP) funded effort to enable librarians and archivists to capture, curate, and preserve political and government information on the Web, and to make the resulting Web archives available to researchers. The Web-at-Risk project is a collaborative effort between the California Digital L...

Publication date: 2008